Freshwater diatoms present an interesting challenge in an era when biodiversity is
becoming a major concern. Although diatoms are tremendously useful indicators of ecological
conditions, past and present, lack of taxonomic knowledge limits the potential of ecological
interpretation. At the same time the ecological studies that are carried out provide less than
optimal feedback to the taxonomic literature. I suggest that appropriate use of available
computer-based technologies can integrate these fields to the benefit of both. I further outline
the approaches taken in an early and primitive attempt to accomplish this goal, the benefits
derived, and the mistakes made and inadequacies of our effort at that time. Thoughtful
application of technologies now available has the potential to further integrate studies and
expand eventual understanding.


The following is a discussion of an attempt to marry the fundamental approaches of systematic
practice to ecological studies. The tools and approaches used are, in retrospect, quite primitive,
but there is an underlying logical framework that applies to all such endeavors, in taking on any
problem at any time. I hope that discussing the way we attempted to solve problems common to
any taxonomically based diatom study, what worked and what didn’t, and the mistakes made, will
be of some value to current investigators.

I should hasten to explain that the “we” in the previous paragraph is used advisedly. I am not
a programmer, so much of the development and implementation was done by other people, better
equipped to deal with the intricacies of programming than I. I thank the late Dr. Vincent Noble and
Dr. Edward Johnston (Johnston and Stoermer 1976) for enlightening discussions of logical
structures appropriate for human-computer interactions. The initial programming was done by Dr.
J.K.C. Huang and the system was brought to its most advanced state largely through the efforts of
Theodore and Barbara Ladewski (Ladewski and Stoermer 1973; Sicko-Goad et al. 1977). Numerous
helpful comments and suggestions were also made by many technical staff and students, which
materially helped shape the project.


The problem 

In the mid 1960s I was a young investigator faced with the rather intimidating problem of
investigating the algal flora of the Laurentian Great Lakes. At the time, severe eutrophication
problems were apparent in many regions of these lakes (Beeton 1965, 1969). Because of the Great
Lakes’ tremendous value to the economies of the United States and Canada, considerable resources
were available for studies related to water quality. Many of the practical problems that beset the
lakes at that time were directly related to algae: taste and odor problems caused by diatoms in the
spring (Vaughn 1961, 1962) and cyanophytes in the summer and fall (Stoermer and Stevenson
1980; Bierman and Dolan 1981; Stoermer and Theriot 1985). Cladophora was a nuisance in many
regions of the lakes (Wolfe and Sweeney 1980) and generally unpleasant, obnoxious conditions
were present in many areas. Lake Erie, in particular, became a cause célèbre of the environmental
activism of the day, and was widely reported in the common press to be a “dead lake.” This was
somewhat problematic to biologists, as the actual problem was over-production, which eventually
led to de-oxygenation of the bottom waters in certain areas of the lakes, creating so-called “dead
zones” where benthic invertebrates were periodically exterminated. In retrospect, the problems of
the 1960s and 1970s were only the most recent in a long history of environmental catastrophes,
such as epidemics of water-borne diseases (Beatty 1982; Bonner 1991) that devastated communities
that drew drinking water from the lakes. For example, the great cholera epidemic of 1854 was
estimated to have killed five percent of the total population of the city of Chicago. Collapse of
native fish stocks began soon after western settlement of the region (Smith 1972), and culminated
in total extermination of some native stocks by 1950 (Beeton 1969) and introduction of many
exotic fish species.

One would rationally suppose such a valuable, but clearly damaged, ecosystem would have 
received careful and comprehensive study, especially considering the large number of well-known 
academic institutions in the region. Unfortunately, this was not the case. The ecological history of 
the Great Lakes, in many respects, provides a sterling example of precisely the wrong way to 
approach management of a large and complex ecosystem. Each successive crisis generated a wave 
of “directed research” centered on the apparent problem and to a lesser extent, if at all, on its root 
causes. “Charismatic vertebrates,” in this case fish, were the initial center of attention, and lesser 
attention and resources were devoted to the rest of the biota or to chemical and physical factors of 
the environment. 

In the case of diatoms, early (in the North American context) exploratory studies were carried
out by J.W. Bailey in 1839, first mentioned in 1842 (Bailey 1842a, 1842b), and sent to C.G.
Ehrenberg, who more formally published them in his monumental works (Ehrenberg 1845, 1854).
These collections are still maintained at the Museum für Naturkunde, Humboldt-Universität zu
Berlin, and have been used in more recent studies of the Great Lakes diatom flora (Stoermer and
Ladewski 1982). Early pollution studies, particularly in the area of Chicago (Thomas and Chase
1887) and Cleveland (Vorce 1881, 1882) produced collections which are still available, but the
majority of taxonomic work undertaken was either un-vouchered, or the material resulting from the
study has been lost. For example, studies on early fisheries declines included some work on
diatoms (e.g., Ward 1896; Thompson 1896) but we have never been able to locate any of these
collections.

Thus, from the beginning it was apparent that the type of supporting references and materials 
generally assumed to be available to ecological studies were lacking. Although this problem is 
obvious in the Great Lakes case, it applies to the majority of studies attempting to use diatoms as 
ecological indicators, as I have argued elsewhere (Stoermer 2001). 

Approach 

Collections 

Early on I determined that it was absolutely necessary to maintain a consistent and reasonably
well ordered reference collection. It was clear that the available literature of the time was grossly
insufficient to support repeatable identifications, so the availability of a reference standard was
essential. Maintenance of vouchers, once a routine part of good scientific practice, has largely been
abandoned in ecological studies. Logically, it is still necessary for studies involving lesser-known
organism groups, and certainly should be a requirement for studies involving diatoms. It is
sometimes argued that maintaining collections is “too expensive” for the competitive world of
ecological funding. In a reasonable and logical world the functions of developing a comprehensive
taxonomy might be separated, as they are for most large organisms, but this was not the case at the
time I began. Although it has become much easier in recent years, due to general recognition of the
biodiversity crisis, in the 1960s and 1970s it was virtually impossible to obtain direct funding for
taxonomic studies of microscopic eukaryotes.

In our case, I simply made the decision that studies from our lab would be supported by
vouchers, as a minimum standard of scientific practice. Our collections are in the form of lots,
numbered consecutively. Each lot consists of raw material, cleaned material, and one or more
slides. In some cases, we have accepted slides from other investigators and integrated them into
the collection without other material, but this is a compromise to be avoided if at all possible.
Because we operated primarily from ships, locality information consists of latitude and longitude
and brief habitat and collection method descriptors. With the current availability of global
positioning system (GPS) apparatus, there is now no excuse not to substitute this unambiguous
information for references to inconstant physical landmarks and place names. In the better systems
available, it is also possible to directly transcribe information electronically, avoiding the
inevitable mistakes introduced by hand transcription.
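The lot structure described above maps naturally onto a simple record type. The following is only a minimal sketch in Python, not a reconstruction of our original files: the class name `Lot`, all field names, and the example coordinates and slide labels are hypothetical, chosen to illustrate the components named in the text (raw material, cleaned material, slides, latitude and longitude, habitat and collection method).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Lot:
    """One consecutively numbered lot in a voucher collection (hypothetical layout)."""
    lot_number: int
    latitude: float            # decimal degrees, north positive
    longitude: float           # decimal degrees, east positive
    habitat: str               # brief habitat descriptor
    method: str                # brief collection method descriptor
    has_raw_material: bool = True
    has_cleaned_material: bool = True
    slides: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject coordinates that cannot be real positions.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")

    def is_complete(self) -> bool:
        """True when raw material, cleaned material, and at least one slide exist."""
        return (self.has_raw_material
                and self.has_cleaned_material
                and len(self.slides) >= 1)


# Illustrative values only; not a real lot from the collection.
example = Lot(lot_number=1, latitude=42.5, longitude=-87.0,
              habitat="offshore plankton", method="pump sample",
              slides=("1-A", "1-B"))
```

A lot accepted as slides only, the compromise mentioned above, would be entered with both material flags set to `False`, so that `is_complete()` makes the gap visible rather than hiding it.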

Index and Pictorial Reference 




When working on a system such as the Great Lakes it is easy to escape the illusion that
appropriate names for all diatoms encountered exist in the literature, or the equally pernicious
assumption that all names in the literature reflect biological reality. For that reason, we have
always treated diatom names as entirely arbitrary. Thus, a nomenclaturally correct binomial is
quite acceptable but, in our system, an arbitrary name (e.g., aff. Navicula ambigua) or a numerical
designation (e.g., Nitzschia 343) is equally acceptable, if it is supported by an adequate
illustration and voucher specimen. This, of course, is a compromise, recognizing the fact that it is
not possible to resolve all taxonomic questions while conducting ecological studies, which
furnished support for our lab at the time the system was instituted. In order to keep internal
consistency, but avoid the extra time
and effort necessary to directly compare specimens under a microscope, we resorted to a
photographic archive. An illustration of the file used is shown in Figure 1. The elements are an
epithet (upper left), one or more photographs (upper right), the dimensions of the specimen(s)
(center) and coordinates of their location on a slide (in parentheses) taken from a particular
microscope indicated by the letter following. Pictorial representations of specimens circled on a
slide, and location of specimen(s) within a particular circle (lower left), photo magnification
(lower center) and the collection number (lower right) are also provided. In our original system,
additional notes were written on the back of the card (Fig. 2). More than one card could be used to
illustrate morphological variation and size series of any given entity (Fig. 3). Of course this is
all very primitive, given the current availability of excellent databases that easily incorporate
such information and are very easy to use. An example is the Filemaker™ template developed by
Joynt (Joynt and Wolfe 1999) that can incorporate all these features and considerably more. The
really important aspect of using such a system, rather than relying entirely on the published
literature, is that it allows one to follow the dictum of “when in doubt, sort it out.” In the case
of the Great Lakes, it was obvious that many “common species” had different morphotypes that had
separate distribution patterns (Pappas and Stoermer 2001), and likely were genetically separate
entities. Although separation of taxa on minor morphological variations might seem risky, in terms
of supporting ecological interpretation, it is vastly less destructive than under-classification
(Birks 1994). In fact, most multivariate statistical techniques will, given that identification is
consistent, merely re-aggregate false separations.

Figure 1. Example of card image used for specimen location and identification. See text for
explanation.

Figure 2. Notes from reverse of card shown in Figure 1. Because Amphora calumetica is relatively
rare, emphasis is on locating a range of specimens.

Figure 3. Example of an ancillary card, showing largest specimen of A. calumetica found at the
time.

Computerization 


In our case, computerization began as a simple data analysis problem. When handling large
data sets, verification and data integrity are always problems, and ones that humans seem to
handle poorly. Remember, these were the days when computer memories were limited and storage
devices primitive. We had quite a struggle with programmers to use names recognizable to humans,
and let the computer do the lookup, rather than simplifying the programmer’s task by using a
simple sequential list of taxa. Although this seems trivial in the modern context, I think there is
an important lesson. Let computers do the simple, purely logical tasks. Save the human ability to
deal with more complex tasks, perhaps aided by calculating engines, for the hard parts.
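The division of labor argued for here can be illustrated with a short modern sketch. It is not the original code, which is not reproduced in this paper; the function names, the normalization rule, and the counts are assumptions made for illustration. The analyst types names a human can read, and the program resolves them to internal sequential codes and reports anything it cannot resolve instead of guessing.

```python
def normalize(name):
    """Collapse runs of whitespace and ignore case so trivial typing
    differences do not cause a failed lookup."""
    return " ".join(name.split()).lower()


def build_lookup(accepted_names):
    """Map each accepted name to a sequential code (1, 2, 3, ...)."""
    return {normalize(n): code for code, n in enumerate(accepted_names, start=1)}


def encode_counts(entries, lookup):
    """Translate (name, count) pairs into (code, count) pairs.

    Entries with an unknown name or an implausible count are reported
    with their position in the input rather than silently dropped.
    """
    coded, problems = [], []
    for line_no, (name, count) in enumerate(entries, start=1):
        code = lookup.get(normalize(name))
        if code is None:
            problems.append((line_no, f"unknown name: {name!r}"))
        elif not isinstance(count, int) or count < 0:
            problems.append((line_no, f"suspicious count: {count!r}"))
        else:
            coded.append((code, count))
    return coded, problems


# Names appear elsewhere in this paper; the counts are invented.
ACCEPTED = ["Amphora calumetica", "Nitzschia 343", "Stephanodiscus binderanus"]
lookup = build_lookup(ACCEPTED)
entries = [
    ("Stephanodiscus binderanus", 120),
    ("nitzschia  343", 7),               # case and spacing differ, still resolves
    ("Stephanodiscus binderanis", 4),    # misspelling, reported
    ("Amphora calumetica", -1),          # impossible count, reported
]
coded, problems = encode_counts(entries, lookup)
```

Here `coded` holds the two clean entries as code and count pairs, and `problems` lists the third and fourth entries for a human to resolve, which is the kind of hard part worth saving human attention for.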

From this humble beginning, we, largely through the efforts of Theodore and Barbara
Ladewski, were able to develop an integrated database system useful to both taxonomy and
ecology. The program’s name, through its several incarnations, was FIDO (a programmer’s play on
the word “phyto”). It consisted of the following elements:

Masterlist: A list of all acceptable names. These could be in the form of proper Latin
binomials, binomials of convenience, or simple numerical or other arbitrary designation. The
important part was that in order to become part of Masterlist, any designation had to be supported
by a marked specimen in the collection and a photographic illustration in the master card file. Of
course, all of these functions can be incorporated in any modern database. A sample portion is
shown in Figure 4.

Deckcheck: A subprogram that checked all entries for codes not acceptable to Masterlist
(coding violations, misspellings, etc.) and “suspicious” data. I am surprised at how few current
databases include extended data verification protocols. It is our experience that an appreciable
error rate is associated with human data entry and review, no matter how careful the analyst or
transcriber, and many of these can be detected by fairly simple data screening protocols.

Figure 4. A fragment of MASTERLIST printed in the late 1970s. Reading from the left, identity
code, a major group and habitat code, two columns of numerical book keeping codes used by the
program, and accepted epithets. At present, only about 20% of arbitrary numerical designations
have been identified with described species.

Tapeit: A subprogram that wrote files for further processing and a separate permanent
archive.


Fetch: A subprogram that retrieved data from the archive, either as hardcopy with summary
statistics (subprogram ANALYZE) or output for further manipulation. An example of the former
is shown in Figure 5. Note that summary statistics are calculated, including error estimates on
counts. A separate, parallel-running system was used to collect and process chemical and physical
data. This system was structured similarly to FIDO, which made merging of the databases for
analysis relatively simple (Fig. 6). Examples of further manipulations include such things as
distribution maps (Fig. 7) and representations of community structure based on multivariate
statistical analyses (Figs. 8 and 9).
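As an illustration of the kind of summary an ANALYZE-style step might produce, the sketch below computes the relative abundance of each taxon in a count together with a simple error estimate. The formula actually used in ANALYZE is not stated here, so the binomial standard error, sqrt(p(1 - p)/N), is an assumption, as are the function name and the example counts.

```python
import math


def summarize(counts):
    """Return (total, rows) for one sample.

    counts maps taxon name -> number of individuals counted.
    Each row is (name, count, proportion, standard_error), where the
    standard error is the binomial estimate sqrt(p * (1 - p) / total).
    This is an assumed error model, not necessarily the one ANALYZE used.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample contains no counted individuals")
    rows = []
    # Most abundant taxa first, as in a typical summary listing.
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        p = n / total
        se = math.sqrt(p * (1.0 - p) / total)
        rows.append((name, n, p, se))
    return total, rows


# Invented counts for two designations mentioned in this paper.
total, rows = summarize({"Nitzschia 343": 20, "Stephanodiscus binderanus": 80})
for name, n, p, se in rows:
    print(f"{name:30s} {n:5d} {100 * p:6.1f}% +/- {100 * se:4.1f}%")
```

Reporting the error alongside each proportion makes it harder to over-interpret differences between samples that are within counting error, which matters most for the rare taxa that dominate any species list.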

In the discussion above readers will note that almost all the design criteria were motivated by
trying to bring some sort of modern taxonomic understanding to relatively large scale ecological
projects, lacking the sort of traditional floristic and monographic support generally assumed.
Perhaps more importantly, once our national science funding establishment began to awake to the
fact that we are living in an ecosystem that is probably less than 20% described, this type of data
base made it possible to attack some real taxonomic problems, particularly of the Great Lakes
region (e.g., Theriot and Stoermer 1984, 1986).

Southern Lake Michigan, August 1971

Figure 5. Example of ANALYZE output taken from a study of whole phytoplankton (diatoms and other
groups) in southern Lake Michigan in 1971. Raw data are shown in right hand columns. Summarized
data are shown in left columns. The large “undetermined” category consists mostly of
microflagellates that cannot be satisfactorily identified with light microscopy.

Mistakes and Problems 

In retrospect, it is nearly always possible to identify mistaken directions and things that
should have been done differently. In our case the worst problems were partially our own fault
and partially due to faults in the system. Part of the problem was that we started early in the
game. Many diatomists resisted computer applications when they first became available. On the
other hand, the funding agencies we dealt with at the time were reluctant to provide support
dedicated to database development at the local project level. Some spent inordinate amounts of
money on commercial database development, but most of these were put together with minimal
inputs from the user community and, although they might have incorporated the latest programming
tricks of the time, were hideously clumsy and inefficient to use. It has been my observation that
most really useful databases incorporate a high level of specific user input, and most really
successful programs are locally developed. Since computerization has become popular our national
funding agencies have devoted considerable resources to development of several generations of
biological databases, but most of this effort has gone to generalized systems that are not
particularly appropriate for the problems faced by diatomists.

Figure 6. Example of data plotted from the study cited in Figure 5, in this case the absolute
abundance of Stephanodiscus binderanus (Kütz.) Krieg relative to temperature (Stoermer and
Ladewski 1976). Curve is fitted to data envelope and estimates of maximum abundance (M) and
dispersion (S) are derived. Anomalous appearing points on the right come from inshore stations in
the fall when populations are injected into the still warm lake from more rapidly cooling streams.

Part of the problem is the structure and economics of the computer industry. The very rapid
expansion of computing power (Moore’s Law) causes rapid obsolescence in microcomputers, a
trend that the industry has capitalized on. It must also be said that University administrations, at
least in this country, have been alert to the fact that the cost of centralized mainframe computer
systems usually becomes their responsibility, whereas much of the cost of decentralized systems
falls on Departments, or individual investigators. It is also a truism that the quickest way for a
software company to go broke is to design a perfect product. It is economically much more
rewarding to design something marginally adequate that can continue to be upgraded. All of this
militates against development of a stable continuing system, and makes upgrading of a developed
system very difficult, in that most resources are devoted to exploiting “exciting” new
technologies, rather than adapting existing databases to them as they arise.

In the case of our system described above, we eventually became victims of the technology 
transition. FIDO was much more complete and easy to use than any of the early microcomputer 
database programs, and we continued to use it well past the transition from a mainframe-based to a
microcomputer-based network system. We were unable to obtain support for conversion from
either local or national funding sources, so much of the data accumulated during this era exists only 
on hardcopy and tapes that are rapidly becoming unreadable. Part of the reason for this was that we 
were somewhat too clever in using “latest technologies” of the day that were specific to the 
University of Michigan mainframe computer system. 

Perhaps the “take home” message for independent laboratories is to develop and use the
simplest system adequate to your specific needs, and upgrade and maintain it judiciously. Although
the latest and greatest in technology is always attractive, pioneers in technology areas often
suffer different, but equally painful, slings and arrows as did the geographic pioneers of past
centuries. In this regard, I think the “open software” movement offers great promise.

Figure 7. Example of species data plotted from a similar study. Distribution of S. binderanus in
Lake Ontario in the spring of 1972. In the lower image, actual numerical values are given at the
top of bars when values are too large to conveniently plot at scale used (from Stoermer et al.
1974).

Present and Future Considerations

I continue to feel that computer assisted approaches offer the best avenue for “marrying” the
needs of taxonomists and ecologists. As I have discussed elsewhere (Stoermer 2001) it is foolish
for ecologists to expect taxonomic treatises on diatoms of the type generally available for
“higher” organisms to become available in the foreseeable future. This being the case, it is really
necessary to incorporate good taxonomic practice into routine analytical work and assure that
project outputs are useful to people whose primary interests are in taxonomy and systematics. At
the same time, it behooves the few people in the latter category to be more proactive in addressing
the resources potentially available from ecological studies.

At present, it is quite feasible for workstations used in diatom analysis to capture and maintain
not only the analyst’s taxonomic decisions, but also images of exemplar specimens such decisions
are based on, the pertinent locality information, and the precise location on a slide of each
specimen assigned to a given category. At the same time, the analyst should be able to address
taxonomic information and identification aids, such as image analysis, directly and in real time.

Whereas the digital tools now available offer exciting possibilities, they also present some real
challenges and dangers. The possibilities for enhanced data display and sharing make the
possibility of “consensus floras” more attractive. Although this may be useful, and indeed
necessary, in the context of a particular ecological project, such efforts can easily degenerate
into lowest common denominator solutions that actually retard scientific progress in the general
field, rather than advancing it. Diatomists are in a particularly difficult situation in this
regard. Taxonomic information in our field is virtually exploding, but most funding agencies, both
those traditionally supporting ecological research and those supporting taxonomic research, tend to
take large organisms as their model for understanding diversity. Even at this level, there is no
logical expectation of ever establishing a truly “stable” taxonomic system unless we are willing to
freeze knowledge in some imperfect state. In the case of diatoms, the present state is grossly
imperfect and the expectation of stability is demonstrably unscientific. Given that there are
snares and pitfalls to be avoided, currently available technologies offer those bold and
resourceful enough to utilize them great possibilities. These range from purely exploratory (we
are still in the era where simple discovery and description probably advances the field more than
any other approach) to application and incorporation of available tools for taxonomic and
ecological questions.

Figure 8. Representation of phytoplankton community structure in southern Lake Huron based on
samples taken 4-8 June 1974 under west wind forcing. Associations were determined using
dimensional ordination and principal components analysis (from Stoermer and Kreis 1980). Materials
and phytoplankton from badly polluted Saginaw Bay are entrained by the spring thermal bar and,
combined with other local shoreline sources, generate “eutrophic” associations in the western
portion of the lake. Mostly agricultural and minor industrial sources from the Canadian shore, also
entrained by the spring thermal bar, produce more “mesotrophic” associations in the eastern portion
of the lake. The oligotrophic associations expected in a large lake of this type are only found in
the offshore waters.



Figure 9. Representation of phytoplankton associations from the same study shown in Figure 8. In
this case, data were collected 26-31 August under east wind forcing. A large upwelling has occurred
in the eastern region of the lake. This combined with local shoreline sources results in atypical
phytoplankton associations in the eastern nearshore region. The extent of nutrient re-supply also
causes somewhat atypical summer associations in most of the southern portion of the lake, and these
communities intrude into Saginaw Bay, as the expected eutrophic communities are transported
northward along the Michigan (western) shore. The expected offshore “oligotrophic” summer
phytoplankton association is only found at a few stations in the north-central quarter.